MEASURE CONCENTRATION FOR EUCLIDEAN DISTANCE IN
THE CASE OF DEPENDENT RANDOM VARIABLES

By Katalin Marton 

Hungarian Academy of Sciences 

Let $q^n$ be a continuous density function in $n$-dimensional Euclidean space. We think of $q^n$ as the density function of some random sequence $X^n$ with values in $\mathbb{R}^n$. For $I \subset [1,n]$, let $X_I$ denote the collection of coordinates $X_i$, $i \in I$, and let $\bar X_I$ denote the collection of coordinates $X_i$, $i \notin I$. We denote by $Q_I(x_I \mid \bar x_I)$ the joint conditional density function of $X_I$, given $\bar X_I$. We prove measure concentration for $q^n$ in the case when, for an appropriate class of sets $I$, (i) the conditional densities $Q_I(x_I \mid \bar x_I)$, as functions of $x_I$, uniformly satisfy a logarithmic Sobolev inequality and (ii) these conditional densities also satisfy a contractivity condition related to Dobrushin and Shlosman's strong mixing condition.

1. Introduction. Let us consider the absolutely continuous probability measures in $n$-dimensional Euclidean space $\mathbb{R}^n$. With some abuse of notation, we use the same letter to denote a probability measure and its density function.

We say that a measure $q^n$ on $\mathbb{R}^n$ has the measure concentration property (with respect to the Euclidean distance) if

$$d(A,B) \le c \cdot \left[\sqrt{\log\frac{1}{q^n(A)}} + \sqrt{\log\frac{1}{q^n(B)}}\,\right] \quad \text{for any sets } A, B \subset \mathbb{R}^n, \tag{1.1}$$

where $d(A,B)$ denotes the Euclidean distance of the sets $A$ and $B$. (We consider only measurable sets. This definition is equivalent to the more familiar definition that involves the probabilities of a set $A$ and its $\varepsilon$-neighborhood.)

Measure concentration is an important property, since it implies sub-Gaussian behavior of the Laplace transforms of Lipschitz functions and thereby is an important tool for proving strong forms of the law of large numbers.

Measure concentration for $q^n$ follows from the validity of a logarithmic Sobolev inequality for $q^n$ by a recent theorem of Otto and Villani (2000). However, in this paper we prove measure concentration in some cases when a logarithmic Sobolev inequality probably cannot be proved by an easy extension of the existing methods.

Consider the following distance between probability measures in $\mathbb{R}^n$:

$$W(p^n, q^n) = \inf_{\pi}\,\bigl[E_\pi |Y^n - X^n|^2\bigr]^{1/2},$$

where $Y^n$ and $X^n$ are random variables distributed according to the laws $p^n$ and $q^n$, respectively, and the infimum is taken over all distributions $\pi$ on $\mathbb{R}^n \times \mathbb{R}^n$ that have $p^n$ and $q^n$ as marginals. This is one of the transportation cost related distances between measures, often called the Wasserstein distance (based on the squared Euclidean distance).
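
As a small illustration of the definition of $W$ (not part of the original argument), consider dimension one and two empirical measures with the same number of equally weighted atoms. In that setting the infimum over couplings is attained by matching the sorted samples, a standard one-dimensional fact that the sketch below takes for granted. The function name `w2_empirical_1d` is hypothetical.

```python
import math

def w2_empirical_1d(ys, xs):
    """W between two empirical measures on the real line with equally many,
    equally weighted atoms (sorted matching is optimal in one dimension)."""
    if len(ys) != len(xs) or not ys:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(ys), sorted(xs))
    mean_sq = sum((y - x) ** 2 for y, x in pairs) / len(ys)
    return math.sqrt(mean_sq)

# Shifting a sample by a constant s moves it a distance |s| in W.
sample = [0.3, -1.2, 2.5, 0.0]
shifted = [v + 2.0 for v in sample]
print(w2_empirical_1d(sample, shifted))
```

The shift example is chosen because its value is known in advance: translating a measure by $s$ moves it exactly $|s|$ in this distance.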

Let us denote by $D(p^n \| q^n)$ the informational divergence of the probability distribution $p^n$ with respect to $q^n$:

$$D(p^n \| q^n) = \int_{\mathbb{R}^n} \log\frac{dp^n}{dq^n}\,dp^n$$

if $p^n$ is absolutely continuous with respect to $q^n$, and $\infty$ otherwise. (By $dp^n/dq^n$ we denote the Radon-Nikodym derivative.)

By a simple argument [first used by Marton (1986, 1996) for Hamming distance and then by Talagrand (1996) for Euclidean distance] it can be shown that if $q^n$ satisfies, for some $\rho > 0$, the "distance-divergence" inequality

$$W(p^n, q^n) \le \sqrt{\frac{2}{\rho}\,D(p^n \| q^n)} \quad \text{for any probability measure } p^n \text{ on } \mathbb{R}^n, \tag{1.2}$$

then it satisfies the measure concentration inequality (1.1) as well (with $c = \sqrt{2/\rho}$). Indeed, assume (1.2) and let $A, B \subset \mathbb{R}^n$ be measurable sets. Denote by $p^n$ and $r^n$ the restriction of $q^n$ to $A$ and $B$, respectively:

$$p^n(C) = \frac{q^n(C \cap A)}{q^n(A)}, \qquad r^n(C) = \frac{q^n(C \cap B)}{q^n(B)}.$$

Since $p^n$ and $r^n$ are supported by $A$ and $B$, respectively, we have, using also the triangle inequality for $W$,

$$d(A,B) \le W(p^n, r^n) \le W(p^n, q^n) + W(r^n, q^n) \le \sqrt{\frac{2}{\rho}\,D(p^n \| q^n)} + \sqrt{\frac{2}{\rho}\,D(r^n \| q^n)}.$$

Since

$$D(p^n \| q^n) = \log\frac{1}{q^n(A)} \quad \text{and} \quad D(r^n \| q^n) = \log\frac{1}{q^n(B)},$$

(1.1) follows.
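
The implication from (1.2) to (1.1) can be checked numerically in the simplest case. The sketch below assumes that $q$ is the standard Gaussian on the real line, for which the distance-divergence inequality holds with $\rho = 1$ (Talagrand's case), and takes the half-lines $A = (-\infty, -a]$ and $B = [b, \infty)$, so that $d(A,B) = a + b$. The function names are hypothetical.

```python
import math

def std_normal_lower_tail(a):
    """q((-inf, -a]) for the standard Gaussian q on the real line."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def concentration_gap(a, b, rho=1.0):
    """Right-hand side of (1.1) minus d(A, B) for A = (-inf, -a], B = [b, inf).

    Uses c = sqrt(2 / rho); rho = 1 is assumed for the standard Gaussian.
    """
    c = math.sqrt(2.0 / rho)
    q_a = std_normal_lower_tail(a)
    q_b = std_normal_lower_tail(b)  # by symmetry q([b, inf)) = q((-inf, -b])
    rhs = c * (math.sqrt(math.log(1.0 / q_a)) + math.sqrt(math.log(1.0 / q_b)))
    return rhs - (a + b)

for a, b in [(0.0, 0.0), (0.5, 1.0), (2.0, 3.0), (5.0, 5.0)]:
    print(a, b, concentration_gap(a, b))
```

A nonnegative gap means (1.1) holds for that pair of sets; the check covers only these half-lines and is not a proof.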

Therefore, our aim is to find sufficient conditions, as general as possible, for a measure $q^n$ to satisfy a distance-divergence inequality (1.2).

The inequality (1.2) was first proved by Talagrand (1996) for the case when $q^n$ is a Gaussian product measure, and his proof for dimension 1 easily generalizes to the case when $q^1$ is uniformly log-concave (that is, the second derivative of $-\log q^1(x)$ is bounded from below by a positive constant). With more effort, using recent results on the solution of the Monge-Kantorovich problem for the Wasserstein distance in Euclidean space [McCann (1995)], this generalization can be carried out for the multidimensional case as well. For an alternative proof, see Bobkov and Götze (1999). Otto and Villani (2000) proved that the distance-divergence inequality (1.2) follows from $q^n$ satisfying a logarithmic Sobolev inequality, that is, from

$$D(p^n \| q^n) \le \frac{1}{2\rho}\int_{\mathbb{R}^n}\left|\nabla \log\frac{p^n}{q^n}\right|^2 dp^n$$

holding for any density function $p^n$ on $\mathbb{R}^n$ such that $p^n/q^n$ is smooth enough. A simple sufficient condition for $q^n$ to satisfy a logarithmic Sobolev inequality is that $q^n$ be a bounded perturbation of a uniformly log-concave function. (See Proposition 1.)

Much effort has been spent to find sufficient conditions for the logarithmic Sobolev inequality in terms of the conditional density functions

$$Q_i(\cdot \mid \bar x_i) = \operatorname{dist}(X_i \mid \bar X_i = \bar x_i)$$

of $q^n$. [Here $\bar X_i$ denotes $(X_k : k \ne i)$.] This problem is not yet satisfactorily solved. In the cases considered, $q^n$ is a Gibbs state (with unbounded spins) over a region $\Lambda$ of the $d$-dimensional integer lattice and $q^n$ corresponds to a pair interaction with bounded range. [See Yoshida (1999a, b), Helffer (1999), Bodineau and Helffer (1999) and Ledoux (1999).]

In this paper we use a different approach: To prove the distance-divergence inequality for $q^n$, we use the distance-divergence inequality for the conditional distributions $Q_I(\cdot \mid \bar x_I)$.

Notation. The integers $i$, $1 \le i \le n$, are called sites and $[1,n]$ is the set of sites. Let $\mathcal{I}$ be a family of sets $I \subset [1,n]$, called patches. Each $I \in \mathcal{I}$ has a multiplicity $\ge 1$ and the number of patches counted with multiplicities is denoted by $N$. A patch consisting of one element $i$ is denoted by $i$. For $x^n \in \mathbb{R}^n$ and $I \subset [1,n]$, $x_I = (x_i : i \in I)$ and $\bar x_I = (x_i : i \notin I)$; for $a^n \in \mathbb{R}^n$ and $I \subset [1,n]$, $|a_I|^2 = \sum_{i \in I} a_i^2$.

Let $q^n$ denote the density of an absolutely continuous probability measure on $\mathbb{R}^n$ and let $X^n$ denote a random sequence in $\mathbb{R}^n$, $\operatorname{dist} X^n = q^n$. Conditional density functions consistent with $q^n$ are expressed as $Q_I(\cdot \mid \bar x_I) = \operatorname{dist}(X_I \mid \bar X_I = \bar x_I)$ ($I \in \mathcal{I}$), whereas $\bar q_I(\bar x_I)$ ($I \in \mathcal{I}$) denotes the density function of $\bar X_I$. The density of a probability distribution on $\mathbb{R}^n$ is denoted by $p^n$ and $Y^n$ denotes a random sequence with $\operatorname{dist} Y^n = p^n$. Conditional density functions consistent with $p^n$ are expressed as $p_I(\cdot \mid \bar x_I) = \operatorname{dist}(Y_I \mid \bar Y_I = \bar x_I)$ and the density function of $\bar Y_I$ is denoted $\bar p_I(\bar y_I)$ ($I \in \mathcal{I}$).

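Let $\mathcal{W}_I(\rho)$ represent the set of all probability distributions $Q_I$ on $\mathbb{R}^I$ that satisfy, for every distribution $p_I$ on $\mathbb{R}^I$, the distance-divergence inequality

$$W(p_I, Q_I) \le \sqrt{\frac{2}{\rho}\,D(p_I \| Q_I)}.$$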

In the simplest case, $\mathcal{I}$ consists of the one-element sets of $[1,n]$. Alternatively, let $n$ be the cardinality of a (large) box $\Lambda$ in the $d$-dimensional lattice $\mathbb{Z}^d$, let $V \subset \mathbb{Z}^d$ be a (relatively small) set and let $\mathcal{I}$ consist of the intersections of the translates of $V$ with $\Lambda$. The multiplicity of such a set $I$ can be taken as the number of different translates of $V$ whose intersection with $\Lambda$ is $I$. Every site is covered then by $|V|$ patches, where $|V|$ is the cardinality of $V$.

Theorem 1 presents a sufficient condition for a distance-divergence inequality of type (1.2) in terms of the conditional distributions $Q_I(\cdot \mid \bar x_I)$ ($I \in \mathcal{I}$). The reason we want a condition in terms of the conditional distributions $Q_I(\cdot \mid \bar x_I)$ is that in statistical physics the model is often defined in such a manner that it gives direct information on these conditional distributions. For example, $q^n$ may be the conditional distribution of a Gibbs random field over a domain in a multi-dimensional lattice with fixed boundary condition.

The conditions of the theorem require that the individual conditional distributions $Q_I(\cdot \mid \bar x_I)$ behave nicely, and we also need the following assumption on the ensemble of the conditional distributions $Q_I(\cdot \mid \bar x_I)$ ($I \in \mathcal{I}$).

Definition 1 (Contractivity condition). Let $\mathcal{I}$ be such that every site $i$ is covered by at least $t \ge 1$ patches $I$. The system of conditional distributions $Q_I(\cdot \mid \bar x_I)$ ($I \in \mathcal{I}$) is $(1-\delta)$-contractive ($\delta > 0$) if for any pair of sequences $(y^n, z^n) \in \mathbb{R}^n \times \mathbb{R}^n$,

$$\sum_{I \in \mathcal{I}} W^2\bigl(Q_I(\cdot \mid \bar y_I),\, Q_I(\cdot \mid \bar z_I)\bigr) \le t \cdot (1-\delta)\,|y^n - z^n|^2. \tag{1.3}$$

For clarity, we formulate the contractivity condition for the special case when $\mathcal{I}$ is the family of one-element patches.

Contractivity condition for one-element patches. We say that the system of conditional distributions $Q_i(\cdot \mid \bar x_i)$ is $(1-\delta)$-contractive ($\delta > 0$) if for any pair of sequences $(y^n, z^n) \in \mathbb{R}^n \times \mathbb{R}^n$,

$$\sum_{i=1}^{n} W^2\bigl(Q_i(\cdot \mid \bar y_i),\, Q_i(\cdot \mid \bar z_i)\bigr) \le (1-\delta)\,|y^n - z^n|^2. \tag{1.3'}$$

The contractivity condition is related to Dobrushin and Shlosman's strong mixing condition. Indeed, it is obviously implied by the following condition:

Definition 2 (Dobrushin-Shlosman-type contractivity condition). Let us assume again that every site $i$ is covered by at least $t \ge 1$ patches $I$. We say that the system of conditional distributions $Q_I(\cdot \mid \bar x_I)$ ($I \in \mathcal{I}$) satisfies a Dobrushin-Shlosman-type contractivity condition if for every $I \in \mathcal{I}$ and $k \notin I$, and for every $\bar y_I$, $\bar x_I$ differing only at site $k$,

$$W\bigl(Q_I(\cdot \mid \bar x_I),\, Q_I(\cdot \mid \bar y_I)\bigr) \le a_{k,I}\,|y_k - x_k|, \tag{1.4}$$

and for the matrix $A = (a_{k,I})$ ($a_{k,I} = 0$ for $k \in I$ by definition)

$$\|A\|^2 \le (1-\delta) \cdot t. \tag{1.3''}$$

Here $\|A\|$ denotes the norm of $A$ considered as an $L_2 \to L_2$ operator and $\delta$ is a positive constant.

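To see that Definition 2 is stronger than Definition 1, note that, by the triangle inequality, (1.4) implies

$$W\bigl(Q_I(\cdot \mid \bar y_I),\, Q_I(\cdot \mid \bar z_I)\bigr) \le \sum_{k} a_{k,I}\,|z_k - y_k| \quad \text{for all } I,\ y^n,\ z^n,$$

so, by the definition of $\|A\|$,

$$\sum_{I \in \mathcal{I}} W^2\bigl(Q_I(\cdot \mid \bar y_I),\, Q_I(\cdot \mid \bar z_I)\bigr) \le \sum_{I \in \mathcal{I}} \Bigl(\sum_{k} a_{k,I}\,|z_k - y_k|\Bigr)^2 \le \|A\|^2\,|y^n - z^n|^2 \le t \cdot (1-\delta)\,|y^n - z^n|^2.$$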
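
Condition (1.3'') is a statement about an operator norm, which can be estimated numerically. The sketch below is an illustration under assumptions that are not in the text: one-element patches ($t = 1$) on a cycle of $n$ sites, with $a_{k,i} = a$ when $k$ is a neighbour of $i$ and $0$ otherwise. It estimates $\|A\|$ by power iteration on $A^{T}A$ and reports the largest $\delta$ allowed by (1.3''). All function names are hypothetical.

```python
import math

def operator_norm(a, iterations=500):
    """Largest singular value of a (list of rows) via power iteration on a^T a."""
    rows, cols = len(a), len(a[0])
    x = [1.0 + 0.1 * j for j in range(cols)]
    for _ in range(iterations):
        ax = [sum(a[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        y = [sum(a[i][j] * ax[i] for i in range(rows)) for j in range(cols)]
        norm_y = math.sqrt(sum(v * v for v in y))
        if norm_y == 0.0:
            return 0.0
        x = [v / norm_y for v in y]
    ax = [sum(a[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(v * v for v in ax))

def cycle_coupling_matrix(n, a):
    """Hypothetical example: each site is influenced only by its two neighbours on a cycle."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][(i + 1) % n] = a
        m[i][(i - 1) % n] = a
    return m

A = cycle_coupling_matrix(8, 0.3)
norm = operator_norm(A)
t = 1                         # one-element patches: every site is covered once
delta = 1.0 - norm ** 2 / t   # largest delta compatible with (1.3'')
print(norm, delta)
```

For this circulant example the norm equals $2a$, so the iteration can be checked against a closed form; for a general matrix $A$ only the numerical estimate is available.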

The stronger version of the contractivity condition given in Definition 2, when considered for one-element patches, is an analog of Dobrushin's (1970) uniqueness condition. In the general case, it is an analog of Dobrushin and Shlosman's (1985a) uniqueness condition (CV). Note, however, that we use a variant of the Wasserstein distance that minimizes the expected squared distance, whereas in condition (CV) of Dobrushin and Shlosman (1985a) a form of the Wasserstein distance is used that minimizes the expected distance without squaring. Moreover, we require (1.4) to hold for all patches $I \in \mathcal{I}$, which means, for the second example considered above, all the intersections of the translates of a given set with another given set. This is reminiscent of condition (CC) in Yoshida (1999b), who formulated a set of mixing conditions, one of which is (CC), that can be considered the analogs of Dobrushin and Shlosman's (1985b, 1987) strong mixing conditions. It is not completely clear how the contractivity condition used in this paper (Definition 1) is related to the set of mixing conditions in Yoshida (1999b). However, we think that the conditions in Yoshida (1999b) should not be considered final and standard yet, since their equivalence among each other and with the logarithmic Sobolev inequality is only proved for ferromagnetic interactions and in the case of superquadratic growth of the single spin phase. We think that the contractivity condition is understandable in itself, and we do not need an analysis of its analogy with the Dobrushin-Shlosman conditions, or the conditions in Yoshida (1999b). We note, however, that we assume nothing that would correspond to the boundedness of the "range of interaction."

In this paper we use only Definition 1. Definition 2 is stated here only to explain the relationship with previously existing concepts.

Theorem 1. Let $\mathcal{I}$ be such that every site $i$ is covered by at least $t \ge 1$ and by at most $v$ patches. Assume that for all $I \in \mathcal{I}$, $Q_I(x_I \mid \bar x_I)$, as a function of $n$ variables, is continuous. Assume further that for every $I \in \mathcal{I}$ and every $\bar x_I$,

$$Q_I(\cdot \mid \bar x_I) \in \mathcal{W}_I(\rho). \tag{1.5}$$

Finally, assume that the system of conditional distributions $Q_I(\cdot \mid \bar x_I)$ ($I \in \mathcal{I}$) is $(1-\delta)$-contractive ($\delta > 0$). Then for any distribution $p^n$ on $\mathbb{R}^n$,

$$W(p^n, q^n) \le C \cdot \sqrt{\frac{1}{\delta} \cdot \frac{v}{t} \cdot \frac{2}{\rho}\,D(p^n \| q^n)}, \tag{1.6}$$

where $C$ is a numerical constant.

Formula (1.6) simplifies if $v = t$, as in the above examples.

The conditions of Theorem 1 are quite abstract, so we are going to formulate a special case where the conditions can be verified.

Write the density function $q^n$ in the form

$$q^n(x^n) = \frac{1}{Z}\exp(-\Phi(x^n)), \tag{1.7}$$

where $Z$ is a normalizing constant. Then the conditional density functions are of the form

$$Q_I(x_I \mid \bar x_I) = \frac{1}{Z(\bar x_I)} \cdot \exp(-\Phi(x^n)), \tag{1.8}$$

where $Z(\bar x_I)$ is the normalizing factor.

When does the ensemble of the conditional densities $Q_I(x_I \mid \bar x_I)$ satisfy the conditions of Theorem 1?

It is natural to try to use the recent result by Otto and Villani that deduces the distance-divergence inequality for some probability measure $q$ on $\mathbb{R}^k$, that is, the relationship

$$W(p, q) \le \sqrt{\frac{2}{\rho}\,D(p \| q)} \quad \text{for all } p,$$

from $q$ satisfying a logarithmic Sobolev inequality.

Definition 3. The density function $q$ on $\mathbb{R}^k$ satisfies a logarithmic Sobolev inequality with constant $\rho$ if for any density function $p$ on $\mathbb{R}^k$, such that $p(x^k)/q(x^k)$ is sufficiently smooth,

$$D(p \| q) \le \frac{1}{2\rho}\int_{\mathbb{R}^k}\left|\nabla \log\frac{dp}{dq}\right|^2 dp.$$

The following sufficient condition follows from the Bakry-Emery (1985) criterion, supplemented by a perturbation result from Holley and Stroock (1987):

Proposition 1. Let $q(x^k)$ be a density function of the form $\exp[-V(x^k)]$ and let $V$ be strictly convex at $\infty$, that is, $V(x^k) = U(x^k) + K(x^k)$, where $K(x^k)$ is bounded, and the Hessian

$$D(x) = (\partial_{ij} U(x))$$

satisfies

$$D(x) \ge c \cdot I$$

for some $c > 0$ (where $I$ is the identity matrix). Then $q$ satisfies a logarithmic Sobolev inequality with constant $\rho$, depending only on $c$ and $\|K\|_\infty$:

$$\rho \ge c \cdot \exp(-4\|K\|_\infty).$$

Note that, on the real line, a necessary and sufficient condition for a density function to satisfy a logarithmic Sobolev inequality was established by Bobkov and Götze. From this result, Gentil (2001) derived the logarithmic Sobolev inequality for a class of density functions, different from the above class. We do not cite this theorem.

Theorem of Otto and Villani (2000) [see Bobkov, Gentil and Ledoux (2001) also]. If the density function $q(x^k)$ ($x^k \in \mathbb{R}^k$) satisfies a logarithmic Sobolev inequality with constant $\rho$, then $q \in \mathcal{W}_{[1,k]}(\rho)$.

[The theorem as stated here was proved by Bobkov, Gentil and Ledoux (2001); its original version from Otto and Villani (2000) contained some minor additional condition.]

To formulate a sufficient condition for the contractivity condition we introduce some notation.

Let $\mathcal{I}$ be a family of patches as in Theorem 1. Consider distribution (1.7) and assume that $\Phi$ is twice continuously differentiable. For a fixed sequence $y^n \in \mathbb{R}^n$ and a vector $\eta = (\eta_I,\ I \in \mathcal{I})$, where $\eta_I \in \mathbb{R}^I$, define a matrix $B = B(\eta, y^n)$. The rows of $B$ are indexed by pairs $(I, i)$ ($i \in I \in \mathcal{I}$), while its columns are indexed by $k$ ($1 \le k \le n$),

$$B = B(\eta, y^n) = \bigl(\beta_{(I,i),k}(\eta, y^n)\bigr),$$

where

$$\beta_{(I,i),k}(\eta, y^n) = \begin{cases} \partial_{i,k}\Phi(\eta_I, \bar y_I), & k \notin I, \\ 0, & k \in I. \end{cases}$$

For the case of one-element patches, the definition of $B(\eta, y^n) = B(\eta^n, y^n)$ becomes quite simple:

$$B = B(\eta^n, y^n) = \bigl(\beta_{i,k}(\eta^n, y^n)\bigr), \qquad \beta_{i,k}(\eta^n, y^n) = \begin{cases} \partial_{i,k}\Phi(\eta_i, \bar y_i), & k \ne i, \\ 0, & k = i. \end{cases}$$

Note that if $\Phi$ has the form

$$\Phi(x^n) = \sum_{i=1}^{n} V(x_i) + \sum_{i \ne k} b_{i,k}\,x_i x_k,$$

then $B$ does not depend on $\eta$ and $y^n$. For example, in the case of one-element patches we have

$$B = (b_{i,k}).$$

Theorem 2. Let $\mathcal{I}$ be a family of patches as in Theorem 1 and assume that $\Phi$ is twice continuously differentiable. Assume furthermore that the conditional densities (1.8), as functions of $x_I$, satisfy a logarithmic Sobolev inequality with the same $\rho$ (independently of $I$ and $\bar x_I$). If

$$\frac{1}{\rho^2}\,\sup_{\eta,\,y^n}\,\|B(\eta, y^n)\|^2 \le t \cdot (1-\delta), \tag{1.9}$$

then the statement of Theorem 1 holds.
In view of the Otto-Villani theorem, the question arises whether the conditions of Theorem 2 might imply a logarithmic Sobolev inequality.

This is not to be expected with the existing proofs of the logarithmic Sobolev inequality for Gibbs fields. Indeed, consider a Gibbs field over a cube $\Lambda \subset \mathbb{Z}^d$, with single spin space $\mathbb{R}$ and potential

$$\Phi(x^n) = \sum_{i \in \Lambda} V(x_i) + \sum_{i,j \in \Lambda} b_{ij}\,x_i x_j + \sum_{i \in \Lambda,\ j \in \mathbb{Z}^d - \Lambda} b_{ij}\,x_i \omega_j, \tag{1.10}$$

where $\{\omega_j : j \in \mathbb{Z}^d - \Lambda\}$ is the configuration outside $\Lambda$. We assume that $V(x)$ is convex at $\infty$, that is, $V(x) = U(x) + K(x)$, where $U''(x) \ge c > 0$ and $K(x)$ is bounded. If $b_{ij}$ does not go to $0$ exponentially fast as $|i - j| \to \infty$, then the proofs of Yoshida and Bodineau-Helffer break down, whereas it is still possible that condition (1.9) of Theorem 2 holds. If $b_{ij} = J > 0$ for $i$ and $j$ nearest neighbors, and $b_{ij} = 0$ otherwise, then Yoshida's proof requires superquadratic growth for the single spin phase $V(x)$ at $\infty$; and for the Bodineau-Helffer proof to work, $2dJ$ must not approach $\rho$, whereas for condition (1.9) of Theorem 2 to hold, it is sufficient that $2dJ < \rho$.

On the other hand, Ledoux's (1999) proof of his Proposition 2.3 does apply for nearest neighbor interactions with interaction coefficient $J > 0$ satisfying $2dJ < c \cdot \exp(-4\|K\|_\infty)$ and proves the correlation bound (DS3) of Yoshida (1999b). By the results of Yoshida (1999b), this is equivalent to the logarithmic Sobolev inequality, provided $V(x)$ grows superquadratically at $\infty$.

However, even if the interaction coefficients and the single spin phase are such that a logarithmic Sobolev inequality holds, that does not yield a simple explicit bound for the logarithmic Sobolev constant nor for the coefficient in the distance-divergence inequality. On the other hand, Theorem 2 implies the following corollary for potential (1.10):

Corollary 1. If for the potential (1.10), the density function $\text{const.} \cdot \exp(-V(x))$ satisfies a logarithmic Sobolev inequality with constant $\rho$ and for the (infinite) matrix $B = (b_{ij})$,

$$\|B\| < \rho,$$

then

$$W^2(p^n, q^n) \le C^2 \cdot \frac{1}{1 - \|B/\rho\|^2} \cdot \frac{2}{\rho}\,D(p^n \| q^n).$$
Remark 1. It is not known whether the distance-divergence inequality (1.2) implies a logarithmic Sobolev inequality (possibly with a different constant) [Villani (2003)]; this would be a converse to the Otto-Villani theorem. Thus it would be very interesting to prove or disprove that a logarithmic Sobolev inequality holds under the conditions of Theorem 2, with a constant depending on $\rho$ and $\delta$.

The proof of Theorem 1 is based on a Markov chain (sometimes called the Gibbs sampler), which realizes a discrete time interpolation between $p^n$ and the Markov chain's invariant measure $q^n$. The contractivity condition allows us to prove that this Markov chain converges to $q^n$ exponentially fast with respect to the Wasserstein distance. (See the end of Section 2.) Before coming to this step, we prove a bound for $W^2(p^n, p^n\Gamma^M)$ (where $p^n\Gamma^M$ is the distribution of the $M$th term of the Markov chain) in terms of $D(p^n \| q^n)$.

2. Some Markov kernels and probability distributions on $\mathbb{R}^n$. The following Markov kernels, which are associated with the conditional density functions $Q_I(\cdot \mid \bar x_I)$ ($I \in \mathcal{I}$), are instrumental in our forthcoming constructions.

For $I \in \mathcal{I}$ define the Markov kernel (i.e., the conditional distribution) $\Gamma_I(dz^n \mid y^n)$ as follows. The projection of $\Gamma_I(\cdot \mid y^n)$ on the coordinates outside $I$ is defined as

$$\Gamma_I(\{\bar y_I\} \mid y^n) = 1.$$

The projection of $\Gamma_I(\cdot \mid y^n)$ on the coordinates in $I$ is given by the conditional density $Q_I(\cdot \mid \bar y_I)$:

$$\Gamma_I(dz_I \mid y^n) = Q_I(z_I \mid \bar y_I)\,dz_I.$$

We define the Markov kernel $\Gamma(dz^n \mid y^n)$ as a mixture of the Markov kernels $\Gamma_I(\cdot \mid y^n)$:

$$\Gamma(dz^n \mid y^n) = \frac{1}{N}\sum_{I \in \mathcal{I}} \Gamma_I(dz^n \mid y^n).$$

Finally, for an integer $M \ge 0$, we denote by $\Gamma^M(dz^n \mid y^n)$ the $M$th operator power of $\Gamma(dz^n \mid y^n)$:

$$\Gamma^M(dz^n \mid y^n) = \int \cdots \int \Gamma(dz^n \mid z^n(M-1))\,\Gamma(dz^n(M-1) \mid z^n(M-2)) \cdots \Gamma(dz^n(1) \mid y^n).$$

Equivalently,

$$\Gamma^M(dz^n \mid y^n) = \frac{1}{N^M}\sum_{I_1, I_2, \ldots, I_M \in \mathcal{I}} \int \cdots \int \Gamma_{I_M}(dz^n \mid z^n(M-1)) \times \Gamma_{I_{M-1}}(dz^n(M-1) \mid z^n(M-2)) \times \cdots \times \Gamma_{I_1}(dz^n(1) \mid y^n).$$
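
To make the kernels concrete, here is a minimal sketch of one draw from $\Gamma(\cdot \mid y^n)$ and from $\Gamma^M(\cdot \mid y^n)$. It rests on assumptions that are not in the text: one-element patches (so $N = n$) and a Gaussian density $q^n \propto \exp(-x^{T}Px/2)$ with a symmetric positive definite matrix $P$, for which each $Q_i(\cdot \mid \bar x_i)$ is Gaussian with mean $-\sum_{k \ne i} P_{ik}x_k / P_{ii}$ and variance $1/P_{ii}$. The names `gamma_step` and `gamma_power` are hypothetical.

```python
import random

def gamma_step(y, precision, rng):
    """One draw from the kernel Gamma(. | y) with one-element patches.

    Assumes q^n is proportional to exp(-x^T P x / 2) with P = `precision`
    symmetric positive definite, so Q_i(. | rest) is Gaussian with mean
    -sum_{k != i} P[i][k] * x_k / P[i][i] and variance 1 / P[i][i].
    """
    n = len(y)
    i = rng.randrange(n)                      # pick the patch {i} with probability 1/N, N = n
    mean = -sum(precision[i][k] * y[k] for k in range(n) if k != i) / precision[i][i]
    z = list(y)                               # coordinates outside the patch are copied
    z[i] = rng.gauss(mean, precision[i][i] ** -0.5)
    return z, i

def gamma_power(y, precision, m, rng):
    """One draw from Gamma^M(. | y): compose M independent steps."""
    z = list(y)
    for _ in range(m):
        z, _ = gamma_step(z, precision, rng)
    return z

P = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 1.0]]
rng = random.Random(0)
print(gamma_power([3.0, -3.0, 3.0], P, 50, rng))
```

Each step mirrors the two-part definition of $\Gamma_I$: the coordinates outside the chosen patch are kept, and the coordinate inside it is redrawn from the conditional density.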
Now, at the risk of redundancy, we give a somewhat lengthy description of the Markov kernel $\Gamma^M(dz^n \mid y^n)$, along with some associated density functions, since it is this description that we use in the sequel. We keep $M$ fixed.

Let us fix a sequence of patches

$$(I_1, I_2, \ldots, I_M). \tag{2.1}$$

Also, fix a density function $p^n = \operatorname{dist} Y^n$. We define successively the density functions

$$r^n(0) = p^n, \qquad r^n(l) = r^n(l-1)\,\Gamma_{I_l}, \quad l = 1, 2, \ldots, M. \tag{2.2}$$

We think of the density functions $r^n(l)$ as being conditional density functions of random sequences $Z^n(l)$, $l = 1, 2, \ldots, M$, given that in a random independent $M$-wise selection from the set $\mathcal{I}$, we have drawn $I_1, I_2, \ldots, I_M$:

$$r^n(l) = \operatorname{dist}(Z^n(l) \mid I_1, I_2, \ldots, I_M).$$

It follows from (2.2) that $r^n(l)$ does not depend on $I_{l+1}, \ldots, I_M$, that is,

$$r^n(l) = \operatorname{dist}(Z^n(l) \mid I_1, I_2, \ldots, I_l)$$

for every $l$.

We also define a joint conditional distribution for $(Z^n(0) = Y^n, Z^n(1), \ldots, Z^n(M))$, given (2.1). First we define, for every $l$,

$$\operatorname{dist}(Z^n(l-1), Z^n(l) \mid I_1, I_2, \ldots, I_M) = \operatorname{dist}(Z^n(l-1), Z^n(l) \mid I_1, I_2, \ldots, I_l)$$

in such a way that

$$\operatorname{dist}(\bar Z_{I_l}(l) \mid Z^n(l-1) = z^n(l-1),\ I_1, I_2, \ldots, I_l)$$

is concentrated on $\{\bar z_{I_l}(l-1)\}$ and, moreover, that

$$\operatorname{dist}(Z_{I_l}(l-1), Z_{I_l}(l) \mid \bar Z_{I_l}(l-1) = \bar z_{I_l}(l-1),\ I_1, I_2, \ldots, I_l)$$

minimizes, for each value of $\bar z_{I_l}(l-1)$, the expected conditional quadratic distance

$$E\{|Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \mid \bar z_{I_l}(l-1),\ I_1, I_2, \ldots, I_l\}.$$

At this point we use condition (1.5) in Theorem 1 to infer that this minimization yields

$$E\{|Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \mid \bar z_{I_l}(l-1),\ I_1, I_2, \ldots, I_l\} \le \frac{2}{\rho} \cdot D\bigl(r_{I_l}(l-1)(\cdot \mid \bar z_{I_l}(l-1)) \,\big\|\, Q_{I_l}(\cdot \mid \bar z_{I_l}(l-1))\bigr) \tag{2.3}$$

for all $\bar z_{I_l}(l-1)$.

Finally, we define

$$\operatorname{dist}(Z^n(0), Z^n(1), \ldots, Z^n(M) \mid I_1, I_2, \ldots, I_M)$$

so that, for $(I_1, I_2, \ldots, I_M)$ fixed, $(Z^n(0), Z^n(1), \ldots, Z^n(M))$ is a Markov chain.

Note that although $r^n(l) = \operatorname{dist}(Z^n(l) \mid I_1, I_2, \ldots, I_l) = r^n(l-1)\,\Gamma_{I_l}$,

$$\operatorname{dist}(Z^n(l) \mid Z^n(l-1),\ I_1, \ldots, I_l) \ne \Gamma_{I_l}.$$

Taking the average with respect to $I_1, I_2, \ldots, I_M$, we get the (unconditional) joint distribution

$$\operatorname{dist}(Z^n(0), Z^n(1), \ldots, Z^n(M)).$$

It is easy to see that

$$\operatorname{dist} Z^n(l) = p^n\Gamma^l \quad \text{for } l = 0, 1, \ldots, M.$$

We use the notation $Y^n$ for $Z^n(0)$ and $Z^n$ for $Z^n(M)$.

It is important in the sequel that the Markov kernels $\Gamma_I$, $\Gamma$ and $\Gamma^M$ all have $q^n$ as invariant measure.

Note that we could (and do, in fact) consider the infinite Markov chain with marginal distributions $p^n\Gamma^l$, $0 \le l < \infty$, as well. This infinite Markov chain is a variant of the so-called Gibbs sampler, which is well known in Markov chain simulation.

In Section 4 we prove that $\Gamma$ is a contraction with respect to the Wasserstein distance, which implies that $p^n\Gamma^m \to q^n$ as $m \to \infty$, exponentially fast:

Proposition 2. Assume that the conditional distribution functions $Q_I(\cdot \mid \bar x_I)$, $I \in \mathcal{I}$, satisfy the contractivity condition (Definition 1) and that every site is covered by at least $t$ patches. Let $p^n = \operatorname{dist} Y^n$ and $r^n = \operatorname{dist} U^n$ be two density functions on $\mathbb{R}^n$. Then

$$W^2(p^n\Gamma, r^n\Gamma) \le \left(1 - \frac{t\delta}{N}\right) \cdot W^2(p^n, r^n).$$

Corollary 2. Under the conditions of Proposition 2,

$$W^2(p^n\Gamma^m, q^n) \le \left(1 - \frac{t\delta}{N}\right)^{m} \cdot W^2(p^n, q^n)$$

for any integer $m \ge 0$.
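
The one-step contraction of Proposition 2 can be verified by direct arithmetic in a simple Gaussian case. The sketch below is an illustration under assumptions that are not in the text: $n$ sites on a cycle, one-element patches ($N = n$, $t = 1$), and conditionals $Q_i(\cdot \mid \bar x_i)$ that are Gaussian with unit variance and mean $-a(x_{i-1} + x_{i+1})$, with $|a| < 1/2$. Two Gaussians with equal variance are at $W$-distance equal to the gap between their means, and the coupling matrix of the cycle has norm $2a$, so $\delta = 1 - (2a)^2$. For deterministic starting points $y^n$ and $u^n$ with difference $d$, the coupling used in Section 4 then gives an explicit value for $E|Y^n(1) - U^n(1)|^2$. The function names are hypothetical.

```python
def one_step_sq_distance(d, a):
    """E|Y(1) - U(1)|^2 for difference d = y - u under the coupling of Section 4.

    Hypothetical model: n sites on a cycle, one-element patches (N = n, t = 1),
    Q_i(. | rest) Gaussian with unit variance and mean -a * (x_{i-1} + x_{i+1}).
    Two such Gaussians are at W-distance |difference of means|.
    """
    n = len(d)
    total_sq = sum(v * v for v in d)
    acc = 0.0
    for i in range(n):                      # site i is refreshed with probability 1/n
        mean_gap = a * (d[(i - 1) % n] + d[(i + 1) % n])
        acc += total_sq - d[i] ** 2 + mean_gap ** 2
    return acc / n

def contraction_factor(n, a):
    """1 - t*delta/N with t = 1, N = n and delta = 1 - (2a)^2 (the cycle matrix has norm 2a)."""
    delta = 1.0 - (2.0 * a) ** 2
    return 1.0 - delta / n

d = [1.0, -0.5, 2.0, 0.0, -1.5, 0.7]
a = 0.3
lhs = one_step_sq_distance(d, a)
rhs = contraction_factor(len(d), a) * sum(v * v for v in d)
print(lhs, rhs)
```

The first printed number should not exceed the second; equality occurs when all coordinates of $d$ are equal, which is the top eigenvector of the cycle matrix.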

3. An auxiliary theorem. A basic tool in the proof of Theorem 1 is the following theorem, which gives a bound for $W(p^n, p^n\Gamma^M)$ in terms of the informational divergence $D(p^n \| q^n)$. We hope that it turns out to be interesting in its own right. For this auxiliary theorem we do not use the contractivity condition.

Auxiliary theorem. Assume that for every $I \in \mathcal{I}$ and every $\bar x_I$, the conditional density function $Q_I(\cdot \mid \bar x_I)$ satisfies condition (1.5), and that each site is covered by at most $v$ patches. Then for any density function $p^n$ on $\mathbb{R}^n$ and for the Markov kernel $\Gamma$,

$$W^2(p^n, p^n\Gamma^M) \le \frac{M}{N} \cdot v \cdot \frac{2}{\rho} \cdot D(p^n \| q^n) \quad \text{for any } M.$$

Remark 2. For the joint distribution $\operatorname{dist}(Y^n, Z^n)$, with marginals $p^n$ and $p^n\Gamma^M$, yielding $W(p^n, p^n\Gamma^M)$,

$$\operatorname{dist}(Z^n \mid Y^n) \ne \Gamma^M,$$

in general.

By the construction of the Markov chain $(Y^n = Z^n(0), Z^n(1), \ldots, Z^n(M) = Z^n)$ we have

$$\operatorname{dist} Z^n = p^n\Gamma^M$$

and we use the joint distribution of the Markov chain to estimate $W^2(p^n, p^n\Gamma^M)$. Clearly,

$$W^2(p^n, p^n\Gamma^M) \le E|Y^n - Z^n|^2.$$

First we prove the following lemma.

Lemma 1. We have

$$E|Y^n - Z^n|^2 \le \frac{M}{N} \cdot v \cdot E\sum_{l=1}^{M} |Z_{I_l}(l) - Z_{I_l}(l-1)|^2.$$

[Note that in this formula the subscripts $I_l$ are random and the expected value takes an average with respect to them, too.]

Proof of Lemma 1. For a realization of the sequence of patches, say

$$(I_1, I_2, \ldots, I_M), \tag{3.1}$$

we denote by $\alpha$ the listing of the sites in the patches (3.1),

$$\alpha = (i_1, i_2, \ldots, i_L), \qquad L = \sum_{l=1}^{M} |I_l|,$$

where $|I_l|$ denotes the cardinality of $I_l$ and $i_m = i \in [1,n]$ if

$$m = \sum_{j=1}^{l-1} |I_j| + r, \qquad 0 < r \le |I_l|,$$

and the $r$th site in the patch $I_l$ is just $i$. Let $\nu_i$ denote the frequency of $i$ in $\alpha$ and let $\mu_{i,m}$ denote the frequency of $i$ in $(i_1, i_2, \ldots, i_m)$.

Write

$$\zeta_{ij}^2 = |Z_i(l) - Z_i(l-1)|^2, \qquad 1 \le i \le n,\ 1 \le j \le \nu_i,$$

if $I_l$ is the $j$th patch in (3.1) that contains the site $i$.

It follows from the triangle and the Cauchy-Schwarz inequalities that

$$E|Y^n - Z^n|^2 \le \sum_{i=1}^{n}\sum_{k} \Pr\{\nu_i = k\} \cdot k \cdot \sum_{j=1}^{k} E\{\zeta_{ij}^2 \mid \nu_i = k\}. \tag{3.2}$$

For $j \le k$ we have

$$E\{\zeta_{ij}^2 \mid \nu_i = k\} = E\{\zeta_{ij}^2 \mid \nu_i \ge j,\ \nu_i = k\},$$

but $\zeta_{ij}$ is conditionally independent of $\nu_i$ under the condition $\{\nu_i \ge j\}$. It follows that for $j \le k$,

$$E\{\zeta_{ij}^2 \mid \nu_i = k\} = E\{\zeta_{ij}^2 \mid \nu_i \ge j\}.$$

Thus (3.2) can be continued to

$$E|Y^n - Z^n|^2 \le \sum_{i=1}^{n}\sum_{j \ge 1} E\{\zeta_{ij}^2 \mid \nu_i \ge j\} \cdot \sum_{k \ge j} \Pr\{\nu_i = k\} \cdot k.$$

Furthermore, for any $i, j$,

$$\sum_{k \ge j} \Pr\{\nu_i = k\} \cdot k \le E\nu_i \le \frac{M}{N} \cdot v,$$

whence

$$E|Y^n - Z^n|^2 \le \frac{M}{N} \cdot v \cdot \sum_{i=1}^{n}\sum_{j \ge 1} E\{\zeta_{ij}^2 \mid \nu_i \ge j\}. \tag{3.3}$$

Put

$$\eta_m^2 = \zeta_{ij}^2, \qquad m = 1, 2, \ldots, L,$$

where $(i,j)$ and $m$ are related as

$$i = i_m, \qquad j = \mu_{i,m}. \tag{3.4}$$

Clearly, whichever choice of $(I_1, I_2, \ldots, I_M)$ is given, for any $(i,j)$ with $j \le \nu_i$, there is exactly one $m$, $1 \le m \le L$, that satisfies (3.4) and vice versa. Note that here $m$ is a random variable (it depends on $I_1, \ldots, I_M$).

Since $\nu_i \ge \mu_{i,m}$, we have

$$E\{\zeta_{ij}^2 \mid \nu_i \ge j\} = E\{\eta_m^2 \mid \nu_i \ge \mu_{i,m}\} = E\eta_m^2. \tag{3.5}$$

We have

$$E\sum_{m=1}^{L} \eta_m^2 = E\sum_{i,j} \zeta_{ij}^2 = E\sum_{l=1}^{M}\sum_{i \in I_l} |Z_i(l) - Z_i(l-1)|^2 = E\sum_{l=1}^{M} |Z_{I_l}(l) - Z_{I_l}(l-1)|^2.$$

This, together with (3.3) and (3.5), completes the proof of Lemma 1. □

Proof of the auxiliary theorem. By Lemma 1, all we have to prove is that

$$E\sum_{l=1}^{M} |Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \le \frac{2}{\rho} \cdot D(p^n \| q^n).$$

In fact, we prove that, for any realization

$$I_1, I_2, \ldots, I_M \tag{3.6}$$

of the sequence of patches, we have

$$E\Bigl\{\sum_{l=1}^{M} |Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \,\Big|\, I_1, I_2, \ldots, I_M\Bigr\} \le \frac{2}{\rho} \cdot D(p^n \| q^n). \tag{3.7}$$

The left-hand side of (3.7) can be written as

$$\sum_{l=1}^{M} E\{|Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \mid I_1, I_2, \ldots, I_l\}.$$

Fix the sequence (3.6) and recall from Section 2 the definition

$$r^n(l) = \operatorname{dist}(Z^n(l) \mid I_1, I_2, \ldots, I_l),$$

according to which $r^n(l)$ is obtained from $r^n(l-1)$ by putting

$$\operatorname{dist}(Z_{I_l}(l) \mid \bar z_{I_l}) = Q_{I_l}(\cdot \mid \bar z_{I_l}) \tag{3.8}$$

and leaving unchanged the distribution of the coordinates outside $I_l$:

$$(\bar r(l))_{I_l} = (\bar r(l-1))_{I_l}. \tag{3.9}$$

It follows from (2.3), (3.8) and (3.9) that the joint conditional distribution

$$\operatorname{dist}(Z^n(l-1), Z^n(l) \mid I_1, \ldots, I_l)$$

can be defined in such a way that

$$E\{|Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \mid I_1, \ldots, I_l\} \le \frac{2}{\rho} \cdot D(r^n(l-1) \| r^n(l)). \tag{3.10}$$

[The distributions $r^n(l-1)$ and $r^n(l)$ depend on $I_1, \ldots, I_l$.] Indeed, the left-hand side of (3.10) can be written as

$$\int E\{|Z_{I_l}(l) - Z_{I_l}(l-1)|^2 \mid \bar z_{I_l}(l-1),\ I_1, I_2, \ldots, I_l\}\,(\bar r(l-1))_{I_l}(\bar z_{I_l})\,d\bar z_{I_l}$$

$$\le \frac{2}{\rho} \cdot \int D\bigl(r_{I_l}(l-1)(\cdot \mid \bar z_{I_l}) \,\big\|\, Q_{I_l}(\cdot \mid \bar z_{I_l})\bigr)\,(\bar r(l-1))_{I_l}(\bar z_{I_l})\,d\bar z_{I_l} = \frac{2}{\rho} \cdot D(r^n(l-1) \| r^n(l)).$$

The last equality here follows from (3.9).

Therefore, it is enough to prove that for any choice of $I_1, \ldots, I_M$,

$$D(p^n \| q^n) \ge \sum_{l=1}^{M} D(r^n(l-1) \| r^n(l)). \tag{3.11}$$

This follows from the identities

$$D(p^n \| q^n) = D(p^n \| r^n(1)) + D(r^n(1) \| r^n(2)) + \cdots + D(r^n(l-1) \| r^n(l)) + D(r^n(l) \| q^n), \tag{3.12}$$

valid for any $l \ge 1$. It is clear that (3.12) for $l = M$ implies (3.11).

We prove (3.12) by induction on $l$. Thus first we claim that

$$D(p^n \| q^n) = D(p^n \| r^n(1)) + D(r^n(1) \| q^n), \tag{3.13}$$

which is just (3.12) for $l = 1$. Indeed, by the well-known decomposition formula for divergence,

$$D(p^n \| q^n) = D(\bar p_{I_1} \| \bar q_{I_1}) + \int \log\frac{p_{I_1}(y_{I_1} \mid \bar y_{I_1})}{Q_{I_1}(y_{I_1} \mid \bar y_{I_1})}\,p^n(y^n)\,dy^n = D(r^n(1) \| q^n) + D(p^n \| r^n(1)).$$

Now apply (3.13) to $r^n(1)$ in the role of $p^n$. This, together with (3.13), yields

$$D(p^n \| q^n) = D(p^n \| r^n(1)) + D(r^n(1) \| r^n(2)) + D(r^n(2) \| q^n).$$

Iterating this step, (3.12) follows for any $l$. □
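
The identity (3.13) can be checked by hand on a finite example. The sketch below replaces densities on $\mathbb{R}^n$ by probability weights on $\{0,1\}^2$, which is an assumption made purely for illustration, and takes $I_1$ to be the first coordinate. Then $r(1) = p\,\Gamma_{I_1}$ keeps the law of the second coordinate under $p$ and redraws the first coordinate from the conditional of $q$. The weights and function names are hypothetical.

```python
import math

def kl(p, q):
    """D(p || q) for distributions given as dicts over the same finite set."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)

def refresh_first_coordinate(p, q):
    """r(1) = p Gamma_I for I = {first coordinate}: keep p's law of the second
    coordinate and redraw the first one from q's conditional Q_I(. | second)."""
    second_values = {b for (_, b) in p}
    p_bar = {b: sum(v for (a2, b2), v in p.items() if b2 == b) for b in second_values}
    q_bar = {b: sum(v for (a2, b2), v in q.items() if b2 == b) for b in second_values}
    return {(a, b): p_bar[b] * q[(a, b)] / q_bar[b] for (a, b) in q}

# Hypothetical two-site example on {0, 1}^2 with strictly positive weights.
p = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}
q = {(0, 0): 0.25, (0, 1): 0.15, (1, 0): 0.35, (1, 1): 0.25}
r1 = refresh_first_coordinate(p, q)
print(kl(p, q), kl(p, r1) + kl(r1, q))
```

The two printed values should agree up to rounding, which is the discrete counterpart of (3.13); iterating the refresh over further patches gives the telescoping sum (3.12).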

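4. Proof of Proposition 2. Consider the joint distribution $\operatorname{dist}(Y^n, U^n)$ achieving $W^2(p^n, r^n)$. Let $Y^n(1)$ and $U^n(1)$ denote random variables with density functions $p^n\Gamma$ and $r^n\Gamma$, respectively.

For a given $I \in \mathcal{I}$, we define a joint conditional density function

$$\operatorname{dist}(Y^n, U^n, Y^n(1), U^n(1) \mid I)$$

as follows. Put $\bar Y_I(1) = \bar Y_I$, $\bar U_I(1) = \bar U_I$,

$$\operatorname{dist}(Y_I(1) \mid Y^n = y^n,\ I) = Q_I(\cdot \mid \bar y_I), \tag{4.1}$$

$$\operatorname{dist}(U_I(1) \mid U^n = u^n,\ I) = Q_I(\cdot \mid \bar u_I), \tag{4.2}$$

and take for

$$\operatorname{dist}(Y_I(1), U_I(1) \mid Y^n = y^n,\ U^n = u^n,\ I)$$

a joining of (4.1) and (4.2) to achieve

$$E\{|Y_I(1) - U_I(1)|^2 \mid Y^n = y^n,\ U^n = u^n,\ I\} = W^2\bigl(Q_I(\cdot \mid \bar y_I),\, Q_I(\cdot \mid \bar u_I)\bigr).$$

We have by (1.3)

$$E|Y^n(1) - U^n(1)|^2 \le \frac{1}{N}\sum_{I \in \mathcal{I}} E\Bigl[\sum_{k \notin I} |Y_k - U_k|^2 + W^2\bigl(Q_I(\cdot \mid \bar Y_I),\, Q_I(\cdot \mid \bar U_I)\bigr)\Bigr]$$

$$\le \frac{1}{N}\sum_{I \in \mathcal{I}} E\sum_{k \notin I} |Y_k - U_k|^2 + \frac{1}{N}(1-\delta)\,t\,E|Y^n - U^n|^2$$

$$\le \left(1 - \frac{t}{N}\right)\sum_{k=1}^{n} E|Y_k - U_k|^2 + \frac{1}{N}(1-\delta)\,t\,E|Y^n - U^n|^2 = \left(1 - \frac{t\delta}{N}\right) E|Y^n - U^n|^2.$$

Proposition 2 is proved.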

5. Proof of Theorem 1. Let $M$ be fixed, and apply Proposition 2 $M$ times to the distributions

$$p^n = \operatorname{dist} Y^n \quad \text{and} \quad r^n = p^n\Gamma^M = \operatorname{dist} Z^n.$$

(We use the notation of Section 2.) We get that

$$W^2(p^n\Gamma^M, r^n\Gamma^M) = W^2(p^n\Gamma^M, p^n\Gamma^{2M}) \le \left(1 - \frac{t\delta}{N}\right)^{M} \cdot W^2(p^n, r^n) \le \exp\left(-t\delta\frac{M}{N}\right) \cdot W^2(p^n, r^n),$$

that is,

$$W(p^n\Gamma^M, r^n\Gamma^M) = W(p^n\Gamma^M, p^n\Gamma^{2M}) \le \exp\left(-t\delta\frac{M}{2N}\right) \cdot W(p^n, r^n).$$

Iterating this step, we get that for any $j \ge 1$,

$$W(p^n\Gamma^{(j-1)M}, p^n\Gamma^{jM}) \le \exp\left(-(j-1)\,t\delta\frac{M}{2N}\right) \cdot W(p^n, r^n).$$

Let us define the random sequences $Y^n(j)$, $j = 0, 1, \ldots$, so that

$$\operatorname{dist} Y^n(j) = p^n\Gamma^{jM}$$

and, for $j \ge 1$,

$$\bigl[E|Y^n(j) - Y^n(j-1)|^2\bigr]^{1/2} \le \exp\left(-(j-1)\,t\delta\frac{M}{2N}\right) \cdot W(p^n, r^n). \tag{5.1}$$

We see that $\{Y^n(j)\}$ is a Cauchy sequence in $L_2$ and thus it converges in $L_2$ to some random sequence $X^n$. However, we must have $\operatorname{dist} X^n = q^n$. Indeed, $q^n$ is invariant with respect to $\Gamma$ and therefore Proposition 2 implies that $W(p^n\Gamma^{jM}, q^n) \to 0$ as $j \to \infty$. Thus we can assume that the sequence $\{Y^n(j)\}$ converges to $X^n$ in $L_2$.

By the estimates (5.1), summing the geometric series and using $W(p^n, r^n) \le [E|Y^n - Z^n|^2]^{1/2}$,

$$\bigl[E|Y^n - X^n|^2\bigr]^{1/2} \le \bigl[E|Y^n - Z^n|^2\bigr]^{1/2} \cdot \frac{1}{1 - \exp(-t\delta(M/(2N)))}.$$

By the Auxiliary Theorem, this implies

$$\bigl[E|Y^n - X^n|^2\bigr]^{1/2} \le \sqrt{\frac{M}{N} \cdot v \cdot \frac{2}{\rho} \cdot D(p^n \| q^n)} \cdot \frac{1}{1 - \exp(-t\delta(M/(2N)))} = \sqrt{2} \cdot \frac{\sqrt{t\delta(M/(2N))}}{1 - \exp(-t\delta(M/(2N)))} \cdot \sqrt{\frac{1}{\delta} \cdot \frac{v}{t} \cdot \frac{2}{\rho} \cdot D(p^n \| q^n)}.$$

Now to complete the proof, it is enough to see that the factor

$$\frac{\sqrt{t\delta(M/(2N))}}{1 - \exp(-t\delta(M/(2N)))}$$

can be bounded by a numerical constant through an appropriate selection of $M$. Notice that the function

$$f(x) = \frac{\sqrt{x}}{1 - e^{-x}}, \qquad x > 0,$$

is bounded in any bounded interval that is bounded away from $0$. If $M$ varies on the integers, then the quantity $x = t\delta M/(2N)$ changes by steps not larger than $1/2$. Thus there is a value of $M$ for which $x = t\delta M/(2N)$ is between $1$ and $3/2$, and so

$$\min_{M} f\bigl(t\delta M/(2N)\bigr) \le \max_{1 \le x \le 3/2} f(x).$$

This completes the proof of Theorem 1.
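
The last step of the proof can be made tangible with a short computation. The sketch below evaluates $f(x) = \sqrt{x}/(1 - e^{-x})$ on a grid over $[1, 3/2]$ and picks the smallest $M$ with $x = t\delta M/(2N) \ge 1$. It assumes $t \le N$ and $0 < \delta \le 1$, so that consecutive values of $x$ differ by at most $1/2$. The parameter values and the name `choose_m` are hypothetical.

```python
import math

def f(x):
    """f(x) = sqrt(x) / (1 - exp(-x)) for x > 0."""
    return math.sqrt(x) / (1.0 - math.exp(-x))

def choose_m(t, delta, n_patches):
    """Smallest integer M with x = t*delta*M/(2N) >= 1.

    Assumes t <= N and 0 < delta <= 1, so consecutive values of x differ by
    at most 1/2 and the chosen x lies in [1, 3/2).
    """
    step = t * delta / (2.0 * n_patches)
    m = math.ceil(1.0 / step)
    return m, m * step

grid_max = max(f(1.0 + 0.5 * k / 1000) for k in range(1001))
m, x = choose_m(t=1, delta=0.64, n_patches=8)
print(grid_max, m, x, f(x))
```

On this interval the grid maximum is attained at $x = 1$, where $f(1) = 1/(1 - e^{-1})$, so the factor in question stays below $1.6$ for the chosen $M$.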
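6. Proof of Theorem 2. The Otto-Villani theorem implies condition (1.5) of Theorem 1. To prove the contractivity condition, fix two sequences $x^n, y^n \in \mathbb{R}^n$. By (1.5) and the logarithmic Sobolev inequality, we have

$$W^2\bigl(Q_I(\cdot \mid \bar x_I),\, Q_I(\cdot \mid \bar y_I)\bigr) \le \frac{2}{\rho} \cdot D\bigl(Q_I(\cdot \mid \bar x_I) \,\big\|\, Q_I(\cdot \mid \bar y_I)\bigr) \le \frac{1}{\rho^2}\int_{\mathbb{R}^I} \sum_{i \in I} \bigl[\partial_i\Phi(\eta_I, \bar x_I) - \partial_i\Phi(\eta_I, \bar y_I)\bigr]^2\,Q_I(\eta_I \mid \bar x_I)\,d\eta_I.$$

It follows that

$$\sum_{I \in \mathcal{I}} W^2\bigl(Q_I(\cdot \mid \bar x_I),\, Q_I(\cdot \mid \bar y_I)\bigr) \le \frac{1}{\rho^2}\int \sum_{I \in \mathcal{I}}\sum_{i \in I} \bigl[\partial_i\Phi(\eta_I, \bar x_I) - \partial_i\Phi(\eta_I, \bar y_I)\bigr]^2 \prod_{I \in \mathcal{I}} Q_I(\eta_I \mid \bar x_I)\,d\eta_I. \tag{6.1}$$

(The integral in the last line is taken over $\prod_{I \in \mathcal{I}} \mathbb{R}^I$.)

Now consider, for a fixed vector $\eta = (\eta_I,\ I \in \mathcal{I})$, the function

$$g = g^{\eta} : \mathbb{R}^n \to \prod_{I \in \mathcal{I}} \mathbb{R}^I$$

defined by

$$g_{I,i}(y^n) = \frac{1}{\rho} \cdot \partial_i\Phi(\eta_I, \bar y_I), \qquad i \in I \in \mathcal{I}.$$

Observe that the expression

$$\frac{1}{\rho^2}\sum_{I \in \mathcal{I}}\sum_{i \in I} \bigl[\partial_i\Phi(\eta_I, \bar x_I) - \partial_i\Phi(\eta_I, \bar y_I)\bigr]^2,$$

integrated (with respect to some density function) in the last line of (6.1), is nothing else than the squared Euclidean norm of the increment of $g^{\eta}$ between the points $x^n$ and $y^n$. By assumption (1.9) of Theorem 2, the norm of the Jacobian of $g^{\eta}$ is bounded by $(t \cdot (1-\delta))^{1/2}$, so (6.1) implies the contractivity condition (1.3).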